|Poster:||danigalv||Date:||Sep 23, 2020 3:53pm|
|Forum:||audio||Subject:||How are asr.js files created?|
Some colleagues and I have been wondering how asr.js files are generated, because their timestamps are incredibly accurate from what we've observed.
First of all, it looks like the recognizer transcribes only in English. For example, this Spanish audio has a nonsensical English transcript: https://archive.org/details/Montgomery_al_D_a_Episode_151_April_21_2015
1) Can someone tell me which speech recognition software was used to generate these files? For example, do they come from some sort of third-party speech transcription API (such as Google, IBM, or rev.ai)? Does it handle out-of-vocabulary words? Note that I am a speech recognition expert, so the more technical detail, the better.
2) We also wonder whether there is any documentation or schema for this JSON file format. So far, we have only been inspecting it ad hoc.
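A minimal sketch of that kind of ad hoc inspection, assuming the file parses as plain JSON. It makes no assumptions about field names, since the schema is exactly what is unknown here; the function names and the local filename in the usage note are hypothetical, chosen for illustration only.

```python
import json
from collections import defaultdict


def summarize_structure(node, path="$", summary=None):
    """Record which JSON value types occur at each structural path."""
    if summary is None:
        summary = defaultdict(set)
    summary[path].add(type(node).__name__)
    if isinstance(node, dict):
        for key, value in node.items():
            summarize_structure(value, f"{path}.{key}", summary)
    elif isinstance(node, list):
        for item in node:
            # All items of a list share one path, so repeated records collapse.
            summarize_structure(item, f"{path}[]", summary)
    return summary


def print_structure(filename):
    """Load a JSON file and print one line per distinct path (hypothetical helper)."""
    with open(filename, encoding="utf-8") as f:
        data = json.load(f)
    for path, types in sorted(summarize_structure(data).items()):
        print(path, sorted(types))
```

Calling `print_structure("asr.js")` on a locally downloaded copy would list every distinct path in the file along with the value types seen there, which gives a rough stand-in for a schema until real documentation turns up.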